3 research outputs found

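    Implementasi Algoritma Penentuan Parameter Densitas Pada Metode Dbscan Untuk Pengelompokan Data [Implementation of a Density Parameter Determination Algorithm in the DBSCAN Method for Data Clustering]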

    DBSCAN is a density-based clustering algorithm. When the data contain regions of varying density, DBSCAN clustering results are suboptimal because a single, global density parameter value is applied to the entire dataset. This final project implementation addresses the problem with a modified DBSCAN in which the density parameter value differs for each cluster. The density parameter values are obtained from the k-nearest neighbors of several data points, so that the points selected are not noise or outliers. The experiments compared the clustering results of standard DBSCAN and modified DBSCAN, using the Dunn Index as the cluster validity measure. By this measure, modified DBSCAN did not outperform standard DBSCAN: the average Dunn Index values were 0.12 and 0.146, respectively. The experiments also compared the cluster labels produced by each method with the class labels in the ground-truth dataset. In this labelling comparison, modified DBSCAN produced clusters that matched the ground truth more closely than those of unmodified DBSCAN.
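The two quantities this abstract relies on can be illustrated with a short sketch. It is not the authors' implementation: the function names and the toy data are hypothetical, the k-th neighbor distance is shown only as one common way to derive a density radius from k-nearest neighbors, and the Dunn Index follows its usual definition (smallest distance between clusters divided by the largest cluster diameter). DBSCAN noise points would need to be dropped before scoring.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (self excluded)."""
    dist = pairwise_distances(points)
    # After sorting each row, column 0 is the point itself (distance 0).
    return np.sort(dist, axis=1)[:, k]

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    dist = pairwise_distances(points)
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    max_diameter = max(dist[np.ix_(idx, idx)].max() for idx in clusters)
    min_separation = min(
        dist[np.ix_(a, b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter

# Toy data (hypothetical): two well-separated groups on a line.
points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
labels = np.array([0, 0, 1, 1])

print(kth_neighbor_distance(points, k=1))  # nearest-neighbor distance per point
print(dunn_index(points, labels))          # 9.0: separation 9, diameter 1
```

A higher Dunn Index means compact, well-separated clusters, which is why a method can score lower on it while still matching ground-truth labels more closely.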

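    Ekstraksi Fitur Conflict of Interest pada Artikel Ilmiah Untuk Menentukan Kualitas Citation Author [Conflict of Interest Feature Extraction on Scientific Articles to Determine Author Citation Quality]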

    Citations of a scientific publication affect the perceived quality of the article and therefore the credibility of its author. Researchers can raise their credibility in many ways, one of which is citing their own work (self-citation). Excessive self-citation, however, lowers the citation quality of a paper and makes bibliometric calculations less accurate, because those calculations do not consider citation quality. Several studies have proposed methods to measure inappropriate self-citation, one of which uses the self-citation ratio within a time window. That method does not consider the topic relatedness between the main paper and the cited paper. It is therefore necessary to determine an author’s citation quality, so that one can tell whether the author often uses citations whose topics do not match, based on the author’s paper and each cited paper. This research proposes a conflict of interest feature extraction method to determine the citation quality of authors of scientific articles, which indicates how well an author uses citations. Two features are proposed. The first is the conflict of interest feature, obtained from conflicts of interest between the authors of a paper and the authors of the papers it cites. The second is the content similarity feature, obtained from the topic similarity between a paper and the papers it cites. Similarity is computed with a deep learning approach, a Siamese Neural Network combined with Long Short-Term Memory, a combination that can provide better similarity results based on training data. Finally, all features are combined with the self-citation count feature from previous research and classified to determine the author’s citation quality. Feature performance is evaluated through classification, with accuracy computed from the classification results. The results show that the proposed features can be used to classify an author’s citation quality, with an accuracy of 66.67% using Random Forest classification and an average accuracy of 62% across the 3 classifiers used.
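As a rough illustration of the kind of author-based signal discussed here, the sketch below computes a plain self-citation ratio: the share of references that have at least one author in common with the citing paper. This is an assumption made for illustration only. The abstract does not spell out how the conflict of interest feature is computed, and the function names and author names are hypothetical.

```python
def shares_author(paper_authors, cited_authors):
    """True when the citing paper and the cited paper have an author in common."""
    return bool(set(paper_authors) & set(cited_authors))

def self_citation_ratio(paper_authors, references):
    """Fraction of references with at least one author of the citing paper.

    `references` is a list of author lists, one per cited paper.
    """
    if not references:
        return 0.0
    hits = sum(shares_author(paper_authors, ref) for ref in references)
    return hits / len(references)

# Hypothetical example: two of the four references share an author.
paper_authors = ["A. Writer", "B. Coauthor"]
references = [
    ["A. Writer"],
    ["C. Other"],
    ["B. Coauthor", "D. Else"],
    ["E. Nobody"],
]
print(self_citation_ratio(paper_authors, references))  # 0.5
```

A ratio like this says nothing about topic relatedness, which is the gap the content similarity feature is meant to cover.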

    Siamese Long Short-Term Memory for Detecting Conflict of Interest on Scientific Papers

    Scientific articles cited by other researchers increase author credibility. However, the citation process may be misused to unnaturally raise a bibliometric indicator such as a researcher’s h-index. Researchers may excessively cite their own works, referred to as self-citation, even when the topics of the references are not related to the current article. A further form of misconduct is excessive citation of the works of people related to the researcher, whether coerced or not, referred to as conflict of interest (CoI). The proposed method uses a deep learning approach, Siamese Long Short-Term Memory (LSTM), to recognize subject similarities between a scientific article and its references. Standard text similarity fails to do so because the contextual relatedness of sentences in the articles requires a learning process. Siamese-LSTM learns the contextual relatedness of sentences in the article using two identical LSTMs. The steps of the proposed method are (i) word embedding to obtain weight values of terms while still considering their semantic relations, (ii) k-means clustering to generate training data, which reduces the time complexity of Siamese-LSTM learning on scientific articles, (iii) learning the Siamese-LSTM weights from the training data to identify the contextual relatedness of sentences, and (iv) calculating the similarity of a scientific article to its references based on Siamese-LSTM. Empirical experiments are used to analyze the similarity values and the possibility of conflict of interest in an article.
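The "two identical LSTM" idea can be sketched as one encoder applied to both inputs with the same weights, followed by a similarity score on the two final hidden states. The sketch below is a forward pass only, with random, untrained weights and random vectors standing in for word embeddings. The similarity function, exp of the negative Manhattan distance, is an assumption borrowed from common Siamese LSTM setups, since the abstract does not name one. All names and dimensions are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(sequence, W, U, b):
    """Run one LSTM over a (steps, input_dim) sequence; return the final hidden state."""
    hidden_dim = U.shape[1]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in sequence:
        gates = W @ x + U @ h + b          # shape (4 * hidden_dim,)
        i, f, o, g = np.split(gates, 4)    # input, forget, output, candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def siamese_similarity(seq_a, seq_b, params):
    """Encode both sequences with the SAME weights, then score in (0, 1]."""
    h_a = lstm_encode(seq_a, *params)
    h_b = lstm_encode(seq_b, *params)
    # Assumed similarity: exp(-L1 distance) between the two encodings.
    return float(np.exp(-np.abs(h_a - h_b).sum()))

# Random stand-ins for trained weights and embedded sentences (hypothetical).
rng = np.random.default_rng(0)
input_dim, hidden_dim = 5, 4
params = (
    rng.normal(scale=0.5, size=(4 * hidden_dim, input_dim)),
    rng.normal(scale=0.5, size=(4 * hidden_dim, hidden_dim)),
    np.zeros(4 * hidden_dim),
)
article = rng.normal(size=(6, input_dim))
reference = rng.normal(size=(8, input_dim))

print(siamese_similarity(article, article, params))    # identical inputs score 1.0
print(siamese_similarity(article, reference, params))  # some value in (0, 1]
```

Because both branches share one set of weights, the score is symmetric and identical inputs always score 1.0. Training, step (iii) in the abstract, would adjust those shared weights so that topically related pairs score higher.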